The goal of this script is to develop a workflow diagram showing how the water treatment projects were merged with habitat restoration projects to arrive at our final “restoration activities”.
#install.packages("DiagrammeR")
#install.packages("DiagrammeRsvg")
#install.packages("rsvg")
library(DiagrammeR) #to make the graph
library(DiagrammeRsvg) #to export the graph to an svg object
library(rsvg) #to export the svg object to a png
To make a graph with DiagrammeR, you need to define how each node relates to all the other nodes. In this case, we have two types of “nodes”: the variables or datasets we are working with (e.g., habitat restoration activities, or the final restoration activities) and the processes (e.g., specific R scripts) used to create them. Getting the syntax of the edge statements right by hand was tedious, so I wrote this function:
#1. make a function to help with writing edge statements
define_edge<-function (node_1,node_2){
edge<-paste0("'",node_1,"' ->"," '",node_2,"';")
return(edge)
}
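As a quick check of what the helper returns, here is a minimal sketch with two placeholder node names ("A" and "B" are for illustration only and are not part of the workflow). The function is repeated so the snippet runs on its own:

```r
# same helper as above, repeated so this snippet is self-contained
define_edge <- function(node_1, node_2) {
  paste0("'", node_1, "' ->", " '", node_2, "';")
}

# "A" and "B" are placeholder node names used only for illustration
example_edge <- define_edge("A", "B")
print(example_edge)
## [1] "'A' -> 'B';"
```

Each name ends up wrapped in single quotes, joined by the -> arrow, and terminated with a semi-colon.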
I chose to store the names of the initial and derived datasets in a vector called “varnames” and the names of the processes used in a vector called “processes”, so I could access these later when making the diagram.
#set up a list of the names of variables
varnames<-c("Descriptions of\n All PS Treatment\n Projects",
"Completed\nPS \nProjects",
"Completed PS\n Project &\nDescriptions",
"Categorized\n PS Treatment\n Projects",
"Habitat \nRestoration\n Treatment\n Projects",
"Categorized\n Habitat\n Restoration\n Projects",
"Restoration \nActivities")
print("Here are the varnames:")
## [1] "Here are the varnames:"
varnames
## [1] "Descriptions of\n All PS Treatment\n Projects"
## [2] "Completed\nPS \nProjects"
## [3] "Completed PS\n Project &\nDescriptions"
## [4] "Categorized\n PS Treatment\n Projects"
## [5] "Habitat \nRestoration\n Treatment\n Projects"
## [6] "Categorized\n Habitat\n Restoration\n Projects"
## [7] "Restoration \nActivities"
#write a list of process steps
processes<-c("Rscript:\n Select \nCompleted Projects",
"Manual\n Categorization of\n PS Projects",
"Manual\n Categorization of\n Habitat Projects",
"Rscript:\n Merge Restoration\n Projects")
print("Here are the processes:")
## [1] "Here are the processes:"
processes
## [1] "Rscript:\n Select \nCompleted Projects"
## [2] "Manual\n Categorization of\n PS Projects"
## [3] "Manual\n Categorization of\n Habitat Projects"
## [4] "Rscript:\n Merge Restoration\n Projects"
The DiagrammeR package needs the names of the multi-line nodes in a specific format: enclosed in single quotes and terminated with a semi-colon (;). For example: 'Descriptions of\n All PS Treatment\n Projects'; So, run a couple of loops to get the variable names and processes correctly formatted:
#create the list of names for the datasets (initial and derived)
nodes<-c()
for (v in varnames) {
node<-paste0("'",v,"';") #enclose the name from varnames in single quotes, end with a ;
nodes<-rbind(nodes, node)
}
print("Here are the variable node names")
## [1] "Here are the variable node names"
nodes
## [,1]
## node "'Descriptions of\n All PS Treatment\n Projects';"
## node "'Completed\nPS \nProjects';"
## node "'Completed PS\n Project &\nDescriptions';"
## node "'Categorized\n PS Treatment\n Projects';"
## node "'Habitat \nRestoration\n Treatment\n Projects';"
## node "'Categorized\n Habitat\n Restoration\n Projects';"
## node "'Restoration \nActivities';"
proc_names<-c()
for (p in processes){
proc<-paste0("'",p,"';") #enclose the name from processes in single quotes, end with a ;
proc_names<-(rbind(proc_names,proc))
}
print("Here are the process node names")
## [1] "Here are the process node names"
proc_names
## [,1]
## proc "'Rscript:\n Select \nCompleted Projects';"
## proc "'Manual\n Categorization of\n PS Projects';"
## proc "'Manual\n Categorization of\n Habitat Projects';"
## proc "'Rscript:\n Merge Restoration\n Projects';"
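The loops work, but paste0 is vectorized, so the same formatting can be done in one call. A minimal sketch, using two names copied from varnames so it runs on its own (the object names varnames_demo, nodes_loop and nodes_vec are mine, for illustration). Note that the loop returns a one-column matrix while the vectorized call returns a plain character vector; once collapsed into a single string they are the same:

```r
# two names copied from varnames above, so the snippet is self-contained
varnames_demo <- c("Completed\nPS \nProjects", "Restoration \nActivities")

# loop version, as in the text: builds a one-column matrix row by row
nodes_loop <- c()
for (v in varnames_demo) {
  nodes_loop <- rbind(nodes_loop, paste0("'", v, "';"))
}

# vectorized version: paste0 is applied to every element at once
nodes_vec <- paste0("'", varnames_demo, "';")

# both collapse to the same string for the graph specification
same_result <- identical(paste(nodes_loop, collapse = ""),
                         paste(nodes_vec, collapse = ""))
same_result
```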
The grViz statement we are going to use to define the relationships requires a series of “edge statements” that define how the nodes connect to each other: for example, which variable is input to which process and how the output flows from there. This is where I’ll use the function defined above to make writing the edge statements easier. Each statement needs to start with the sending node name in single quotes, then include the -> characters, then the receiving node name in single quotes, and finally end with a semi-colon (;).
#define the edges
edge1<-define_edge(varnames[1],processes[1])
edge2<-define_edge(varnames[2],processes[1])
edge3<-define_edge(processes[1],varnames[3])
edge4<-define_edge(varnames[3],processes[2])
edge5<-define_edge(processes[2],varnames[4])
edge6<-define_edge(varnames[4],processes[4])
edge7<-define_edge(varnames[5],processes[3])
edge8<-define_edge(processes[3],varnames[6])
edge9<-define_edge(varnames[6], processes[4])
edge10<-define_edge(processes[4],varnames[7])
#now, bind them all together in one vector to use later:
edges<-rbind(edge1,edge2,edge3,edge4,edge5,edge6,edge7,edge8,edge9,edge10)
print("here are the edge specifications:")
## [1] "here are the edge specifications:"
edges
## [,1]
## edge1 "'Descriptions of\n All PS Treatment\n Projects' -> 'Rscript:\n Select \nCompleted Projects';"
## edge2 "'Completed\nPS \nProjects' -> 'Rscript:\n Select \nCompleted Projects';"
## edge3 "'Rscript:\n Select \nCompleted Projects' -> 'Completed PS\n Project &\nDescriptions';"
## edge4 "'Completed PS\n Project &\nDescriptions' -> 'Manual\n Categorization of\n PS Projects';"
## edge5 "'Manual\n Categorization of\n PS Projects' -> 'Categorized\n PS Treatment\n Projects';"
## edge6 "'Categorized\n PS Treatment\n Projects' -> 'Rscript:\n Merge Restoration\n Projects';"
## edge7 "'Habitat \nRestoration\n Treatment\n Projects' -> 'Manual\n Categorization of\n Habitat Projects';"
## edge8 "'Manual\n Categorization of\n Habitat Projects' -> 'Categorized\n Habitat\n Restoration\n Projects';"
## edge9 "'Categorized\n Habitat\n Restoration\n Projects' -> 'Rscript:\n Merge Restoration\n Projects';"
## edge10 "'Rscript:\n Merge Restoration\n Projects' -> 'Restoration \nActivities';"
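Because define_edge is built on paste0, it also accepts whole vectors of sending and receiving nodes, so the edges can be written as two parallel vectors instead of ten separate calls. A minimal sketch with short placeholder labels (the from and to vectors and their labels are hypothetical, not the real node names):

```r
# same helper as above, repeated so this snippet is self-contained
define_edge <- function(node_1, node_2) {
  paste0("'", node_1, "' ->", " '", node_2, "';")
}

# hypothetical short labels standing in for the full node names
from <- c("Input A", "Input B", "Merge")
to   <- c("Merge",   "Merge",   "Output")

# one call builds every edge statement, pairing from[i] with to[i]
edges_demo <- define_edge(from, to)
edges_demo
```

For the real diagram, the from and to vectors would hold the same varnames[...] and processes[...] entries used in the ten calls above.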
DiagrammeR has a grViz function that creates a “Graphviz” object, but the specifications are written in something called the DOT language. All of the specifications for how the graph should look are contained in one complicated statement. I also wanted to pass the values from my “nodes” and “edges” vectors into that statement, so that any change to varnames or processes would propagate through the rest of the code.
The grViz statement is a long string, so to be able to pass in the nodes and edges vectors, I wrapped the long statement in a paste0 command and created an object called config_statement, which I will later pass to the grViz function from DiagrammeR:
config_statement<-paste0("
digraph {
# graph attributes - rankdir = LR makes it left to right rather than vertical
graph [overlap = true, rankdir=LR]
# node attributes
node [shape = box,style=filled,
fontname = Helvetica,
color = cadetblue3]
# edge attributes
edge [color = gray]
",
paste(nodes,collapse=''),
"# node attributes
node [shape = diamond, style=filled,
color = sandybrown,
fixedsize = true,
width = 2.3,
height=1.8]
# edge statements
",
paste(edges,collapse=''),
"
}")
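When a specification is assembled from pieces like this, it helps to print it with cat() before handing it to grViz, since a missing quote or semi-colon is much easier to spot in the assembled text. A minimal sketch with shortened, hypothetical node and edge statements (nodes_demo, edges_demo and spec_demo are illustrative names). As I understand the DOT language, a node [...] statement sets defaults for nodes created after it, which is why the datasets declared first get boxes and the processes that first appear in the edge statements get diamonds:

```r
# hypothetical, shortened node and edge statements for illustration
nodes_demo <- c("'Dataset A';", "'Dataset B';")
edges_demo <- c("'Dataset A' -> 'Process 1';", "'Process 1' -> 'Dataset B';")

spec_demo <- paste0(
  "digraph {\n",
  "node [shape = box]\n",                  # default for the dataset nodes below
  paste(nodes_demo, collapse = "\n"), "\n",
  "node [shape = diamond]\n",              # default for nodes first seen in the edges
  paste(edges_demo, collapse = "\n"), "\n",
  "}"
)

# cat() prints the string with real line breaks instead of escaped \n
cat(spec_demo)
```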
This part is simple: pass the config_statement created above to the grViz function from DiagrammeR to create the graph:
#plot the diagram using grViz (from the DiagrammeR package)
flowchart<-grViz(config_statement)
flowchart
Matt magically fixed the problem and figured out how to programmatically export the graph to an svg object and then export that to a png. We need two additional packages for this. DiagrammeRsvg contains the “export_svg” command, which converts the graph to svg (returned as a character string). rsvg contains the rsvg_png command (and other commands for other formats), which converts the svg object to a png.
svg<-export_svg(flowchart) #export the graph to an svg string
## Warning in make_context(private$console): '.Random.seed' is not an integer
## vector but of type 'NULL', so ignored
## pre-main prep time: 1 ms
rsvg_png(charToRaw(svg),"restoration_activities.png")
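If the png comes out too small for a report or slide, rsvg_png also takes a width argument in pixels. A minimal sketch, using a small hand-written svg string as a stand-in for the output of export_svg (svg_demo and the temporary output path are assumptions for illustration, and the call is skipped if rsvg is not installed):

```r
# stand-in for the svg string returned by export_svg(); hand-written for illustration
svg_demo <- paste0(
  '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="50">',
  '<rect width="100" height="50" fill="#7AC5CD"/>',
  '</svg>'
)

# hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "demo.png")

if (requireNamespace("rsvg", quietly = TRUE)) {
  # width is the requested output width in pixels; height is left at its default
  rsvg::rsvg_png(charToRaw(svg_demo), out_file, width = 1000)
}
```

For the flowchart itself, the same width argument could be added to the rsvg_png call above.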